Confidence Intervals for the Pythagorean Formula in Baseball

Author

  • David D. Tung
Abstract

This paper investigates the problem of obtaining confidence intervals for a baseball team’s Pythagorean expectation, i.e. its expected winning percentage and expected games won. We study this problem from two perspectives. First, in the framework of regression models, we obtain prediction intervals for a new observation on the basis of historical binomial data for Major League Baseball (MLB) teams from the 1901 through 2009 seasons, and apply them to the 2009 MLB regular season. We also obtain a Scheffé-type simultaneous prediction band and use it to tabulate predicted winning percentages and their prediction intervals over a range of values of log(RS/RA), where RS and RA denote runs scored and runs allowed. Second, we introduce parametric bootstrap simulation as a data-driven, computer-intensive approach to computing confidence intervals for a team’s expected winning percentage numerically. Under the assumption that runs scored per game and runs allowed per game are random variables following independent Weibull distributions, we calculate confidence intervals for the Pythagorean expectation via parametric bootstrap simulation from each team’s runs scored per game and runs allowed per game in the 2009 MLB regular season. The interval estimates from either framework allow us to infer with greater certainty which teams are performing above or below expectations. The bootstrap confidence intervals appear to be better at detecting such teams than the prediction intervals obtained in the regression framework.
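
The parametric bootstrap described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's actual procedure: the exponent 1.82, the Weibull location shift of -0.5, the method-of-moments fit (a simple stand-in for whatever estimator the paper uses), the percentile interval, and all function names are choices made here for the example. The per-game runs in the demo are synthetic, not 2009 MLB data.

```python
import math
import random
import statistics

GAMMA = 1.82   # assumed Pythagorean exponent
SHIFT = -0.5   # assumed Weibull location, so zero-run games have positive support


def pythagorean(rs, ra, gamma=GAMMA):
    """Expected winning percentage from average runs scored and allowed."""
    return rs**gamma / (rs**gamma + ra**gamma)


def fit_weibull_moments(xs, shift=SHIFT):
    """Method-of-moments fit of a shifted Weibull; returns (shape, scale)."""
    ys = [x - shift for x in xs]
    mean = statistics.fmean(ys)
    cv2 = statistics.pvariance(ys) / mean**2  # squared coefficient of variation

    def cv2_of(k):
        return math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1

    lo, hi = 0.2, 20.0  # cv2_of decreases in k, so bisect on this bracket
    for _ in range(60):
        mid = (lo + hi) / 2
        if cv2_of(mid) > cv2:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    return shape, mean / math.gamma(1 + 1 / shape)


def simulate_mean(shape, scale, n, rng, shift=SHIFT):
    """Average of n draws from the fitted shifted Weibull."""
    return statistics.fmean(rng.weibullvariate(scale, shape) + shift for _ in range(n))


def bootstrap_ci(rs_games, ra_games, n_boot=2000, level=0.95, seed=0):
    """Point estimate and percentile interval for the Pythagorean expectation."""
    rng = random.Random(seed)
    n = len(rs_games)
    k_rs, s_rs = fit_weibull_moments(rs_games)
    k_ra, s_ra = fit_weibull_moments(ra_games)
    # Each replicate simulates one season of runs scored and allowed independently.
    reps = sorted(
        pythagorean(simulate_mean(k_rs, s_rs, n, rng), simulate_mean(k_ra, s_ra, n, rng))
        for _ in range(n_boot)
    )
    alpha = 1 - level
    lower = reps[int(alpha / 2 * n_boot)]
    upper = reps[int((1 - alpha / 2) * n_boot) - 1]
    point = pythagorean(statistics.fmean(rs_games), statistics.fmean(ra_games))
    return point, lower, upper


if __name__ == "__main__":
    # Synthetic per-game runs for one hypothetical team over 162 games.
    demo = random.Random(1)
    rs = [max(0, round(demo.weibullvariate(5.8, 1.8) - 0.5)) for _ in range(162)]
    ra = [max(0, round(demo.weibullvariate(5.2, 1.8) - 0.5)) for _ in range(162)]
    point, lower, upper = bootstrap_ci(rs, ra)
    print(f"expected win pct {point:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

A team whose actual winning percentage falls outside the resulting interval would, under these assumptions, be flagged as performing above or below expectation.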

Similar resources

A Derivation of the Pythagorean Won-loss Formula in Baseball

It has been noted that in many professional sports leagues a good predictor of a team’s end of season won-loss percentage is Bill James’ Pythagorean Formula $\frac{RS_{\mathrm{obs}}^{\gamma}}{RS_{\mathrm{obs}}^{\gamma} + RA_{\mathrm{obs}}^{\gamma}}$, where $RS_{\mathrm{obs}}$ (resp. $RA_{\mathrm{obs}}$) is the observed average number of runs scored (allowed) per game and $\gamma$ is a constant for the league; for baseball the best agreement is when $\gamma$ is about 1.82. This formula is often used in t...

Exact maximum coverage probabilities of confidence intervals with increasing bounds for Poisson distribution mean

The Poisson distribution is widely used as a standard model for analyzing count data, so estimation of its parameter is widely applied in practice. Providing accurate confidence intervals for the parameters of discrete distributions is very difficult. So far, many asymptotic confidence intervals for the mean of the Poisson distribution have been provided. It is known that the coverag...

Hitting Is Contagious in Baseball: Evidence from Long Hitting Streaks

Data analysis is used to test the hypothesis that "hitting is contagious". A statistical model is described to study the effect of a hot hitter upon his teammates' batting during a consecutive game hitting streak. Box score data for entire seasons comprising [Formula: see text] streaks of length [Formula: see text] games, including a total [Formula: see text] observations were compiled. Treatme...

A confidence-aware interval-based trust model

It is a common and useful task in a web of trust to evaluate the trust value between two nodes using intermediate nodes. This technique is widely used when the source node has no experience of direct interaction with the target node, or the direct trust is not reliable enough by itself. If trust is used to support decision-making, it is important to have not only an accurate estimate of trust, ...

Confidence Intervals for Lower Quantiles Based on Two-Sample Scheme

In this paper, a new two-sample scheme is proposed to construct appropriate confidence intervals for the lower population quantiles. The confidence intervals are determined in the parametric and nonparametric settings, and the optimality problem is discussed in each case. Finally, the proposed procedure is illustrated via a real data set.

Publication date: 2010